Statistics of cluster multiplicity and the nature of 
UHE cosmic ray sources 



> 



Diego Harari, Silvia Mollerach and Esteban Roulet 
CONICET, Centro Atomico Bariloche, 
Av. Bustillo 9500, Bariloche, 8400, Argentina. 



O 
O 

<N 

February 2, 2008 

in 



Abstract 

We study in detail the properties of clusters of ultra high energy cosmic 
■^j- ' ray events, looking in particular to their angular correlation function, to the 

relative frequency of clusters with different multiplicities (e.g. doublets vs. 
^-j. . triplets or quadruplets), as well as the way in which these quantities should 

CD ! evolve for different ultra high energy cosmic rays source scenarios as a function 

of the experimental exposure achieved. We identify some useful tools which 
can help to characterise the nature of the cosmic ray sources in a more precise 
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way after a modest increase in statistics will be achieved in the very near 
future, even before strong signals from individual sources become eventually 
observable. 



1 Introduction 

>3 

In spite of the big effort done over the last decades to unravel the origin and nature 
of the cosmic rays (CRs) at ultra high energies (UHE), most of the relevant questions 
related to them remain still unanswered [1]. Certainly the main obstacle has been 
the scarcity of data, which is due to the smallness of the fluxes at the highest energies 
(e.g. ~ l/km 2 yr for E > 10 19 eV). The observed isotropy of CRs above the ankle 
(i.e. for E > 5 x 10 18 eV), suggests that at these energies CRs are of extragalactic 
origin, but the nature of these particles (whether they are protons, heavier nuclei, 
photons or even neutrinos with stronger interactions), the nature of their sources 
(e.g. steady ones like active galactic nuclei jets, transient ones like gamma ray 
burst (GRBs) or diffuse ones like decaying topological defects) and the mechanism 
responsible for their acceleration in bottom-up scenarios or production in top-down 
ones (and hence the original source spectra) are all under strong debate. All these 
questions are further challenged by the (lack of) observation of the theoretically 
expected GZK suppression above ~ 6 x 10 19 eV, due to the attenuation that CR 
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protons coming from far away sources should suffer due to their interactions with 
CMB photons (or similarly due to the photodisintegration processes that would 
affect the propagation of heavier nuclei). Clearly a big boost in our understanding 
of these issues could be obtained if individual CR sources were identified, and this 
search is indeed the main purpose of the next generation of observatories, like the 
Pierre Auger one now under completion. 

There has been a lot of discussion during the last years about possible hints in the 
existing data of an excess of clustering in the highest energy range (E > 4 x 10 19 eV) , 
that could be already indicating the possible emergence of the first strong CR sources 
out of a more diffuse background of unresolved sources. This excess clustering (with 
respect to the expectations from chance coincidences of an isotropic distribution) has 
been observed by the AGASA group [2] (which detected 4 doublets and one triplet 
within 2.5°, out of a total of 57 events 1 ) and also in a combined analysis of data from 
several experiments [3] (Yakutsk, Haverah Park, AGASA and Volcano Ranch, which 
obtained 8 doublets and 2 triplets within 4° out of a total of 92 events), although it 
does not seem to be present in Fly's Eye or Hi Res data [4]. 

In this work we will study in detail the properties of the event clusters expected in 
different scenarios of UHECR sources, with the aim of devising the most appropriate 
tools to discriminate among them. The results we obtain suggest that a modest 
increase in the present statistics (as could be achieved after one year of AUGER 
observations) will allow to clarify significantly the existing debate, even before any 
strong (and hence unambiguous) signal from an individual CR source is observed. 

2 Clusters from chance coincidences 

Before considering the possibility that the clusters are due to strong CR sources 
with large enough fluxes so as to have induced more than one event with the present 
experimental exposures, it is important to consider the expected clustering signal 
from just chance coincidences in the limit in which all events observed were produced 
by independent sources, which are too weak to produce by themselves an individual 
cluster (i.e., considering that there is a very large number of sources, all with a 
small chance of producing even a single event in the present experiments). As we 
show below, this probability depends strongly on the way the sources are distributed 
across the sky, and so we start with the simplest case of an isotropic distribution. 

2.1 Isotropic sources 

A commonly adopted measure of a clustering signal is obtained from the study of 
the angular correlation of pairs, dN p /d8, which gives the number of event pairs with 

1 We are not including in our discussion one event with energy slightly below 4 x 10 19 eV, 
included by AGASA because it lead to an extra doublet. 
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angular separation between 9 and 9 + d9 expected from a total number n of events. 
In the case of an isotropic distribution, it can be written as 



dK 



,/ oc sin# / dfii / dQ 2 utSMh) (Jffix • Q 2 - cos#), (1) 
d9 J J 

where dJl, = cos5jd5jdaj in terms of declination and right ascension, and <5j has to 
be integrated over the declinations observable to the experiment. The normalization 
is such that the total number of pairs (i.e. integrating over 9) is just n{n — l)/2. 

The function w(5) takes into account the declination dependence of the exposure 
of the experiment under consideration, and for a surface detector 2 is given by [5] 

u(S) oc cos On cos S sin a m + a m sin 6 sin 5, (2) 

where 60 is the latitude of the experiment (e.g. b = 35.8° for AGASA) and 
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with # m the maximum zenith angle of the CR showers considered (8 m = 45° for the 
AGASA sample). 

Potential point sources of UHECRs should manifest themselves through a sig- 
nificant enhancement above random expectations of the correlation function at 
values around and below the angular reconstruction accuracy. Energy- dependent 
deflections of CR trajectories by intervening magnetic fields may disperse the ar- 
rival directions, and thus clustering signals may be spread over larger angular 
scales. It thus proves convenient to analyse the integrated correlation function 
N p (< 6) = J e dd' dN p /d8', which is simply the total number of event pairs which 
are separated by an angle smaller than 6 (albeit the most common practice is to 
look instead at the distribution diVp/d#). The number of pairs at angles smaller 
than 9 is plotted in the upper panel in Figure 1, for a total number of 57 events and 
for the AGASA exposure, and is compared with the AGASA data. The solid line 
follows the mean value of 10000 simulated isotropic random realizations. The error 
bars contain 95% of the results in the random realizations at each angular scale (the 
larger and smaller 2.5% portions were discarded). The points are the AGASA data. 
An excess of pairs with respect to random expectations is apparent at small angles 
(9 < few degrees), which is related to the observation of 4 doublets and one triplet 
(corresponding to three additional pairs) at separations smaller than 2.5° in the 
AGASA data, while on average 1.45 doublets would be expected. The probability 3 
of getting seven or more pairs at 9 < 2.5° in the random realizations is ~ 1.5 x 10" 3 . 

2 For the air-fluorescence technique the exposure is more uniform in declination, but has also a 
dependence on right ascension (see e.g. [4]). 

3 Thc significance of these probabilities is a matter of debate [6, 7, 8], given its dependence on 
the size of the angular bin and the energy threshold. 
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Figure 1: Number of pairs (upper panel), doublets (middle) and triplets (lower) 
separated by an angle smaller or equal than 9. The solid lines follow the mean of 
10000 random realizations of 57 events isotropically distributed with AGASA's ex- 
posure. Error bars contain 95% of the results in the simulations. Points correspond 
to AGASA data above 4 x 10 19 eV. Significant departures of the data from random 
expectations are apparent at angular scales smaller than a few degrees in all three 
plots, as well as at scales between 9 and 14° in the distribution of doublets and 
triplets (but not in the number of pairs). 

The angular correlation of pairs is the lowest order of a hierarchy of correlation 
functions. Additional information on the clustering process is encoded, for instance, 
in the distribution of clusters with different multiplicities. The middle and lower 
panels in Figure 1 plot respectively the mean number of doublets, N 2 (< 6), and 
triplets, N 3 (< 6), in random realizations of 57 events with the AGASA exposure, 
compared with the AGASA data. The excess clustering at small angular scales in 
AGASA data is noticeable. The probability of forming four doublets at angular 
scales below 2.5° is ~ 0.06, and the probability of forming one triplet by chance is 
~ 0.016. The probability of forming 4 doublets and one triplet is of course even 
smaller than that of forming seven pairs, and is of order ~ 9 x 10 -4 . 

Notice that while the number of event pairs in AGASA data shows no significant 
departures from isotropic expectations at angular scales larger than a few degrees, 
their distribution in clusters of different multiplicities deviates noticeably from the 
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mean at angular scales roughly between 9° and 14°. The number of doublets is 
in defect and the number of triplets is in excess with respect to random isotropic 
realizations, at a probability level comparable to the small angular scale clustering 
signal. Spreading of arrival directions from point UHECR sources by magnetic 
fields and structure in their spatial distribution are among conceivable causes of 
such departures. We will come back to a more thorough characterization of this 
signal in section 4, with alternative statistical tools. 

One can get a feeling about the amount of clusters expected for different multi- 
plicities by considering the following simplified model to describe the way clusters 
are produced by chance coincidences of random isotropic data (a similar approach 
was adopted in ref. [9] to obtain the probabilities for observations of different num- 
bers of clusters). Let us first ignore the possible effects of a non-uniform exposure, 
and let us divide the total solid angle f2 monitored by the experiment (Q ~ 7.2 sr for 
AGASA when 8 max = 45° is considered) into N solid angle bins of characteristic size 
Ail ~ ti6 2 , so that N ~ Vt/AVL ~ 1050 (fi/27r)(2.5 /#) 2 for small angles 4 . Now, we 
start adding events up to a total number n (n = 57 for the published AGASA data 
to which we want to compare), assuming that each event has the same probability 
(j) = l/N) to fall into any of the independent solid angle bins. When two events fall 
into the same bin, this will then produce a doublet with characteristic separation 
smaller than 9. It is then simple to show that the average number of clusters N m (n) 
with a given multiplicity m (e.g. m = 2 for doublets, etc.) after observing n events 
satisfy 

N m {n + l) = (l-p) N m {n) + p N m ^{n) (4) 
and hence they are just given by a binomial distribution 

n \ m-l. 



Nm{n) = {' m )p m -\l-p) n - m . (5) 

In particular, one has 

— / lX™ -1 

Ni(n) —nil — —j ~ n exp(—n/N) (6) 

— n(n-l)(n-2) / 1 \"~ 3 n 3 

N M = J 2 1 1 - ^ ™ eM-n/N) (8) 



QN 2 V N J QN 2 

where the right-most expressions hold for n ^> 1. The exponential factors are 
actually only relevant when the total number of events becomes of the order of the 
number of available solid angle bins, since in this case most of the events appear in 
clusters and the multiplicity of the bins rapidly grows suppressing the number of low 
multiplicity clusters. Clearly by the time this will happen at small angular scales, 



l N = (fi/27r)/(l — cos#) when 6 is not small enough. 
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the discussion here will be redundant since already the statistics would have become 
large enough to allow the origin of the observed clustering to be clarified, and hence 
we will focus the discussion here mainly to the situation in which n <^ N. In this 
case we see that the number of pairs grows quadratically with the number of events 
observed (and hence with the experimental exposure), the number of triplets grows 
as the third power and so on, as expected. On the other hand, the ratio between the 
number of triplets and doublets is N 3 (n)/N 2 (n) ~ n/3N, and is hence very small 
as long as n <C N, as should be the case at present for AGASA. 

Hence, a generic prediction of clustering at small angular scales by chance co- 
incidences of an isotropic distribution is that one expects first to see a growing 
number of doublets (oc £ 2 , where S is the exposure achieved, and of course one 
expects n oc S) with no clusters of higher multiplicities (except for very unlikely 
fluctuations). Then, when the total number of events becomes ~ (6iV 2 ) 1//3 , triplets 
start to appear with a faster rate of growth (oc S 3 ). When this happens the number 
of doublets is typically iV 2 ~ iV 1 / 3 , which is already not a small number. We see 
then that it is not only surprising to have 4 doublets in the AGASA data when 
1.5 (~ 57 2 /2000) were expected, but it is probably even more surprising to have 
observed a triplet out of only 57 events. As an example, we show in Figure 2 the 
average results obtained from many random isotropic realizations, taking into ac- 
count a declination dependent exposure like that of AGASA, and plot the resulting 
number of pairs, doublets, triplets and quadruplets obtained (for 9 < 2.5°) as a 
function of the number of events (solid lines). The main features just mentioned 
that result from the simple model (dashed lines) are clearly apparent. 

The simple model predicts a smaller number of doublets with respect to the 
Monte Carlo results, what can be understood because the model with solid angle 
bins requires that each event be asigned to a unique cluster, and thus does not include 
properly all possible configurations. Consider as an example the case of three events 
aligned that lead to two doublets but not a triplet 5 . In this case, one of the events 
participates in two different doublets, but clearly no event can fall in two different 
angular bins simultaneously. The simple model also predicts an excess (at relatively 
small n) of triplets and quadruplets with respect to the Montecarlo results. This 
can be understood because once two events are separated by an angle < 9, to form a 
triplet the third event has to fall within a solid angle smaller, on average, than tt9 2 . 
Hence, the effective number of bins that define the probability of higher multiplicity 
clusters is larger than the value of N used to calculate the probability of forming 
doublets. At relatively larger n, when clusters of multiplicity m + 1 start to form, 
the simple model predicts instead a defect of clusters of multiplicity m, for reasons 
analogous to those that explain the defect in doublets. Notice that this should also 
affect the probabilities obtained in [9]. The limitations of the simple model appear 
at smaller values of the number of events n when larger angular separations are 
considered, since the number of bins N becomes significantly smaller. 

5 Alignment of events can be caused by energy-dependent deflections in magnetic fields, which 
makes even more relevant to properly discriminate such a clustering signal. 
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Figure 2: Mean number of pairs (thick solid line), doublets, triplets and quadruplets 
(thin solid lines) in simulated realizations of n events randomly distributed with 
AGASA's declination-dependent exposure. The result of the simple analytic model 
of eq. (5) for doublets (m=2), triplets (m=3) and quadruplets (m=4) is plotted with 
dashed lines. 

Another property of the multiplets that result from chance coincidences of an 
isotropic distribution is that, once we take into account that the relative expo- 
sure is proportional to u>(5), and hence that the number of observed events should 
follow dn/d5 oc u(5) cos 5 (with the cos 5 factor just from the solid angle associ- 
ated to 5), the distribution of event pairs on the sky should be proportional to 
dN p oc uj(5) 2 cos<5d<5, and similarly the number of triplets (including those in quadru- 
plets and higher multiplets) should behave as dN t oc u(S) 3 cos<5d<5, and so on. Hence, 
the high multiplicity clusters should appear more significantly concentrated at de- 
clinations close to the latitude of the experiment, where u(5) is maximal. This also 
implies that taking into account the declination dependence of the exposure will 
slightly enhance the ratio N 3 /N 2 (as well as the value of N 2 itself) with respect to 
the estimates done neglecting u(S). We show in Figure 3 the observed distribution 
(solid histogram) of events and of event pairs (dashed histogram) in AGASA data, 
together with the expectations from chance coincidences out of an isotropic distri- 
bution. It is clear that although the angular distributions are consistent with the 
hypothesis of isotropic sources, there is an overall excess in the number of pairs with 
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Figure 3: Distribution in declination of AGASA events (solid histogram) and 
AGASA event pairs (dashed histogram) compared to that expected from chance 
coincidences of an isotropic distribution with AGASA's declination dependence of 
the exposure (solid and dashed curves). 

respect to expectations from chance coincidences. 

2.2 Structured sources 

As is well known, the matter distribution in the local Universe is rather inhomo- 
geneous, specially if we consider the neighbourhood where trans-GZK events may 
originate (distances < 100 Mpc). In this region indeed large voids are found, and 
most of the galaxies are distributed on large scale structures like the great attractor, 
the supergalactic plane, etc., which only cover a relatively small fraction of the sky 
(see e.g. Figure 10 in [1]). If UHECR sources were to follow a similar distribu- 
tion, this can have a significant impact on the statistics of the clustering produced 
by chance overlaps among different sources. For instance, if the sources were dis- 
tributed uniformly over the sky but 'avoiding voids', so that the effective solid angle 
where sources can be found is smaller than Q, one could repeat the reasoning of the 
previous section and find similar expressions but now with a smaller number N' of 
solid angle bins where clusters could be formed. More generally, if we substitute 
in the expressions of the previous section n/Q = p, which would be the density 



8 



of events per unit solid angle, it is clear that in the limit in which pAf2 <C 1 the 
generalization of the expression for the number of doublets for the case in which the 
density of sources is not uniform becomes 



N 2 {n) 



An 




(9) 
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If we now define the average density of events as 



(10) 



we obtain 



N 2 (n)~W 2 



so 



(n)((p/p) 2 ), 



(11) 



where 




(12) 



is the average number of doublets expected from an isotropic distribution (the N 2 
computed in the previous section), while 



is a quantity bigger than unity. If one assumes that the distribution of UHECR 
sources somehow follows the distribution of visible matter, one expects that if the 
sources are indeed nearby (at least for trans-GZK events) there should be a signifi- 
cant non- uniformity in the arrival directions of the observed events, probably leaving 
void regions in the sky and with the events finally falling, once larger statistics will 
be attained, along large scale structures reminiscent to strips across the sky (at these 
energies, deflections due to magnetic fields are not expected to be large, at least for 
CR protons). Notice that due to the limited statistics accumulated so far, such a 
picture is not inconsistent with current observations. 

As a simple example, suppose that only a fraction £ of the observed sky has a uni- 
form distribution of sources, while the rest consists of void regions. This would lead 
to p = p/£ in the regions with sources and hence ((p/p) 2 ) = l/£, which shows that a 
significant enhancement in the number of pairs can be achieved by distributing the 
sources anisotropically. This is understood since in the higher density regions the 
probability of forming a doublet by chance coincidences is largely enhanced (oc p 2 ). 
An enhancement in the number of doublets was indeed reported in numerical simu- 
lations with a structured source distribution [10]. An even more pronounced effect 
should also result for the probability of formation of triplets, which can similarly be 
shown to behave as 




(13) 




(14) 
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3 Clustering from individual sources 



The alternative explanation that has been suggested for the observed clustering is 
that the events in each multiplet originated in the same sources, so that the doublets 
and triplets observed would correspond to the first manifestations of the brightest 
sources in the sky. In this scenario, the clusters produced by chance coincidences 
discussed in the previous section would just be a background for the signal searched, 
and the excess observed should be attributed to individual sources. 

Given a source with a certain flux F (for energies above some specified threshold), 
the probability that it gives rise to a certain number m of events follows a Poissonian 
distribution [11] 



Tfl 

P m (F) = — e~ m (15) 
ml 

where the mean multiplicity is m = FS, with £ being the experimental exposure 
towards the direction of the source (for definiteness we will assume that the exposure 
is uniform over the whole sky, and just comment later on the effects of non-uniform 
exposures). Hence, if dn s /dF denotes the number density of sources leading to a 
flux between F and F+dF, the average number of multiplets expected will be [12] 

roc Am 

N m = J o dF ^Pm(F) (16) 

On the other hand, the probability of finding k clusters of multiplicity m will 
also be Poisson distributed, i.e. 

7v fc — 

P m (k) = e~ N ™. (17) 

Regarding the source distribution function, the simplest situation would be to 
assume a uniform distribution of sources with similar intrinsic CR luminosities L. 
In this case, if we further neglect for simplicity the effects of energy losses, the 
sources leading to a flux larger than F would just be those at a distance closer than 
what gives 

4 / L \ 3 / 2 

ns{> F) = r ps k^f) ' (18) 

with p s the source volume density. Hence, in this case one finds 

^ oc F~^. (19) 

Notice that this power law behaviour would lead to a divergent total flux if 
extrapolated down to very small fluxes (this is nothing else than Olber's paradox), 
and it is clear that CR attenuation during propagation as well as the effects of the 
universe expansion and source evolution will ultimately smooth out the faint end 
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of the source distribution. The simplest way to cure this is to just introduce a 
lower cutoff flux F min below which no sources contribute, and since we are mainly 
interested on the effects of the bright end of the distribution, our main conclusions 
will be independent of the particular way in which this cutoff is handled. 

Another simple scenario which has been considered is that the CR sources are 
transient, such as in the models of acceleration during GRBs. In this case, due to the 
energy dependent time delays r(E, d) suffered by CRs after traversing a distance d 
across a magnetised universe, a bursting source will be seen (neglecting energy losses 
and strong lensing effects) as a monochromatic flux at the observer, with an energy 
slowly decreasing with time but on a scale typically much longer than the duration 
of the experiments [13]. However, stochastic energy losses during CR propagation 
introduce a spread in time delays Ar(d, E) which leads to the observation of CRs 
with a certain spread of energies at any given time, with AE/E ~ At/t. Moreover, 
when strong magnetic lensing effects are present, so that a CR source is actually 
observed as a very large number of multiple images associated to different alternative 
paths from the source to the observer, this also introduces a spread in the energies of 
the particles observed at any given time. There are two ways in which these processes 
affect the number of sources above a given flux 6 which can be observed at any given 
moment. First, the longer is the spread in time delays, the more likely it becomes for 
a source to contribute to the CR fluxes during the period of observation, but however 
since the CRs produced in the burst will spread over a longer time interval, the flux 
will be more suppressed than the naive d~ 2 behaviour. As argued in ref. [13], one 
expects At oc r oc (d/E) 2 , and hence one has that F oc l/(d 2 Ar) oc d~ 4 . Hence, the 
number of sources above a certain flux (i.e. within an associated distance d oc F^ 1 / 4 ) 
will be [13] 

n s (> F, E) oc \nd z At oc F~ 5/4 . (20) 
This gives a differential source density 

J? oc F-»l\ (21) 

In view of the previous results, we may then parametrise the source distribution 
with a simple power law 

dn s _ aN s (F min \ a+1 (00 , 
dF ~ F V F ) ' 1 ' 

ul 1 mm V 1 / 

with a lower cutoff at the minimum flux F m i n and where N s is the total number 
of sources above F min . For the steady source model one has a — 3/2 while for the 
bursting scenario a = 5/4. For these simple scenarios one obtains, from eqs. (16) 
and (17), an expected average number of multiplets 

<x/V 

N m = — -T(m - a, F min 8) (F mm S) a (23) 
ml 



3 For transient sources, the flux should be considered in a given energy bin around E [13]. 
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with T(x, y) the incomplete Gamma function. Since we are interested in cases where 
1 < a < 2, one finds that as long as £ <C F~l n (i.e. that the faint sources have 
a small probability of leading to one event) the expected average number of event 
clusters (with m > 2) will have very little sensitivity to the second argument of 
the Gamma function, leading to N m oc S a T(m — a) /ml, while the number of single 
events (not in clusters) does depend on the faint end of the source distribution 7 , hav- 
ing the approximate behaviour N 1 ~ aN s £F min /(a — 1). Clearly these approximate 
behaviours cease to be valid when £ ~ F~] n , but before that happens (as should 
be anyway the case for the next few years) one has that while the average number 
of individual events (which will be the majority) should grow approximately like £, 
the number of clusters grows at a faster rate N m oc S a . Of course the total number 
of events is expected to grow strictly proportionally to the exposure, as results from 
Eq. (16), since n = J2 mN m = F tot £, with F tot the total flux. The fraction of events 
which are in multiplets (neglecting here chance coincidences) grows hence, as long 
as S <C F min , as S a 1 . 

It is also interesting to compute the ratio of expected number of clusters with 
different multiplicities, 



Ni ml r(Z-a) 



N m l\ r(m-a) 
and in particular for I — m + 1 one obtains 

N m+1 (m - a) 



N m (m+1) 
Another related result is that 



(l,m>2), (24) 



(m > 2). (25) 



ss^^ga^) (26) 

iV 2 m=3l=3 1 

Notice that these ratios are independent of the total number of events n, contrary 
to the case of chance coincidences. On the other hand, the dependence on the 
distribution of the source fluxes is rather noticeable, for instance, for a = 3/2 the 
ratio in eq. (26) is 0.33, while for a = 5/4 it becomes instead 0.60. 

All these statistical quantities do not depend on whether the angular distribution 
of sources in the sky is isotropic or structured, as long as the average radial distri- 
bution does not differ much from uniformity. Of course fluctuations would affect 
these average behaviours, and they could come from a spread in the source luminosi- 
ties, which could be either intrinsic or caused by magnetic lensing effects [17] (this 

7 Sincc our treatment of the faint end has been rather crude, one should not use statistics such 
as the number of single vs. double events to get information on the number of sources contributing, 
and in this respect the parameter N s just comes out as a normalisation constant. An estimate of 
the number of sources producing CRs depends sensitively on the way the faint end is handled, and 
will require other general features of the CR sources, such as the luminosity distribution, to be 
understood as well [12, 14, 15, 16]. 
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Figure 4: Distribution in declination of cluster multiplicity (left panel) and number 
of clusters (right panel) in AGASA data (histograms) compared to that expected 
from an isotropic distribution of stationary sources (a = 3/2, see text), for which 
the overall normalization adopted is just arbitrary (solid curves). 

could typically produce some clusters of larger than expected multiplicities), from 
fluctuations in the radial source density distribution, from small statistics, etc.. 

Regarding the effects of varying exposures, it is clear that in the regions of 
larger exposures the sources are expected to yield clusters of larger multiplicities 
on average, so that one expects that with large statistics, an uniform distribution 
of sources would lead to an average multiplicity varying as u(5) a . On the other 
hand, the number of clusters (with multiplicities m > 2), i.e. N d = J2m>2^m, 
should follow the distribution dA c ; oc u(5) a cos<5d<5, and although the data are still 
scarce, the observed clusters are consistent with these behaviours, as illustrated in 
Figure 4. Notice that the total number of events (in clusters or not) should follow 
dn oc u>(6) cos5d5 if the source distribution is isotropic, so that a deviation from this 
behaviour would be a clear signal of anisotropies, no matter what is the underlying 
reason for the clustering observed (i.e. chance coincidences vs. strong sources). 
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4 Nearest neighbour distribution 



Up to now we have discussed in detail the properties of the distribution of multi- 
plets in the different possible clustering scenarios. One of the main criticisms to the 
statistics discussed has been the dependence on the angular binning of the data, as 
the results depend on its choice. We will discuss now another statistics to analyse 
small scale clustering, the frequency distribution of the angular distance to the near- 
est neighbours of each event, which avoids this problem. Each event has a first, a 
second, and so on, nearest neighbour event, and the idea is to look at the distribu- 
tion of their distances. This statistics has some advantages over the autocorrelation 
function of pairs when studying non-linear clustering, as it depends on the corre- 
lation functions of all orders. This is because the probability of finding no events 
at a distance smaller than a given angle depends on the correlation functions of all 
orders. This method has been used for example by Bahcall and Soneira to study 
the distribution of stars and the fraction of binaries and triples [18]. 

The cumulative distributions of the distance to the first and second neighbour 
for the 57 events of AGASA as a function of the angular distance are shown in 
the solid histogram of Figure 5. The expected distribution for a sample of random 
points in the sphere has been computed by Scott and Stout [19]. However, in order 
to compare with the distribution of the AGASA data we have to take into acount 
the incomplete sky coverage and inhomogeneous exposure. Thus, we used instead 
the results of 10000 simulations of 57 random events with the AGASA exposure, 
the mean of which is shown in Figure 5 (dashed line). The AGASA distribution 
shows an excess of first neighbours at small angles, with a maximal departure at 
2.5°, with the excess disappearing at angles slightly larger than 5°. This excess is 
clearly related to the excess of doublets and triplets at small angles discussed before. 
The cumulative histogram of the second neighbours (dotted histogram) also shows 
an extended excess with respect to random expectations (dash-dotted line) from 
angles around 2.5° up to 15°, with a maximal deviation at 11°. 

In order to have a measure of the significance of the first and second neighbour 
excesses, the natural choice would be to perform a Kolmogorov-Smirnov test. How- 
ever, a necessary condition for this test is that the data be all independent, and the 
nearest neighbour separations between events are not independent variables (since 
often the nearest neighbour of an event has as nearest neighbour that event). Thus, 
we determine the significance of a given deviation D of the cumulative histogram 
from that expected from a random sample of the same number of events using Mon- 
tecarlo simulations and look for the fraction of the simulations with a deviation 
from the mean larger than the observed D. The first neighbour distribution shows a 
maximum deviation from the mean of the random simulations D = 0.146, with 6% 
of the simulations having larger deviations at any angle, while the second neighbours 
curve maximum deviation is D = 0.246, with 0.8% of the simulations showing larger 
deviations. 

We have also checked the deviation of the mean angular separation to the first 
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Figure 5: Cumulative distribution of the distance to the first (solid histogram) and 
second (dotted histogram) neighbour of the 57 AGASA events above 4 x 10 19 eV. 
The dashed and dashed-dotted curves are the mean distributions for 10000 random 
realizations of 57 events with AGASA exposure. 



and second neighbour of the data from that expected for a random sample. While 
the AGASA data have first neighbour mean distance {Q\)ag — 9.66°, the mean for a 
random sample is (0i) ran = 10.18°, with 25% of the simulations having values smaller 
than the AGASA one. For the second neighbours we find (62) ag — 13.57°, while 
(^2)™ = 15.35°, with only 2.3% of the simulations having smaller than AGASA's 
mean value. 

Let us note that the maximum deviation of the second neighbour distance cu- 
mulative distribution, that takes place at 6 ~ 11°, reflects the clustering at all 
smaller scales. In particular, the effect of the triplet is clearly seen in the rise of 
the histogram at 2.5°, which produces a shift in the whole curve. Also the excess 
of doublets at small angles is expected to affect the second neighbours distance 
distribution: the events entering in the doublets will have their second neighbours 
at the mean distance at which singlets have their first neighbours. Then, we may 
ask if the whole deviation of the second neighbours at 11° can be explained as a 
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Figure 6: Same as Figure 5, but now the simulations have 53 events randomly 
thrown with AGASA exposure, and the remaining 4 events are forced to form one 
triplet and two doublets within 2.5° of the rest. 



result of the excess of small scale clustering (triplet and doublets). To answer this, 
we performed a similar analysis, comparing the data with 10000 simulations of 57 
events, 53 of which are randomly thrown with the AGASA exposure, and the rest 
are forced to form a triplet and two doublets within 2.5° with some of the other 
events (chance alignments of random points are expected to produce 1.5 additional 
doublets on average at 2.5°). The resulting cumulative distributions for the first and 
second neighbours distance are shown in Figure 6. The first neighbours distribution 
of the AGASA data are now in good agreement with the result of the simulations, as 
well as the small angle piece of the second neighbours distribution. There remains 
however a deviation of this distribution with a maximum distance D = 0.187 at 
11°, with 7.6% of the simulations showing larger deviations. The mean distance 
to the second neighbour of the simulation is now reduced to (0 2 ) r an = 14.5°, with 
15% of the simulations showing values smaller than the AGASA's one. While the 
significance of the deviation is reduced by forcing the small scale clustering in the 
simulations, it remains however a hint of extra clustering at angular scales around 
10° that may be due to point source events that have been deflected by magnetic 
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fields, or to structure in the spatial distribution of sources. 

5 Discussion 

We have seen that the statistics of clusters of events with different multiplicities is a 
very useful tool that may help characterise the nature of UHECR sources. We have 
analysed the way in which the relative number of clusters with different multiplici- 
ties, their change with increasing exposure, and their distribution with declination 
can help discriminate among alternative source scenarios. In particular, one of the 
most striking differences between the individual sources and chance coincidences 
scenarios for clustering is the expected hierarchy of the number of clusters with dif- 
ferent multiplicities. For example, the ratio between the number of triplets or higher 
multiplets and doublets is ~ 1/3 for the steady source model (or 0.6 for the bursting 
one), while it is only ~ n/3N for the chance coincidences, where N ~ 10 3 (2.5°/#) 2 . 
In case the clusters are dominated by individual sources, we see that with significant 
statistics this may even give a handle to discriminate between bursting and steady 
source scenarios. Moreover, the evolution of this ratio (which is proportional to the 
exposure for chance coincidences, while independent of it for individual sources) will 
give a further tool to discriminate among scenarios. On the other hand, the relative 
fraction of events in clusters (with respect to the unclustered ones) is only ~ n/2N 
for the chance coincidences of an isotropic distribution, but could be significantly en- 
hanced if the distribution of sources is anisotropic. This should be however testable 
by studying the overall event distribution once larger statistics are achieved. In 
the case of clusters produced by individual sources, this fraction depends instead 
sensitively on the amount of very faint sources present. 

We have applied our analysis to the published set of 57 AGASA events above 4 x 
10 19 eV just for illustrative purposes, since it is clearly insufficient to draw definitive 
conclusions. Nevertheless, our results indicate that a modest increase in the number 
of events may already give much stronger hints about the source properties. In 
this respect, even the release of the latest AGASA data, which represents a 50% 
increase with respect to the published data, could have some impact. For instance, 
it would be natural to expect that the triplet becomes a quadruplet, and some 
doublets become triplets, if they are due to individual sources, while this would be 
very unlikely if they arose from chance coincidences. 

We have also applied an alternative statistical tool to analyse UHECR clustering: 
the nearest neighbour distribution. It has the advantage over the pair autocorrela- 
tion method of being less sensitive to binning, and that it depends not just on the 
pair correlations but on correlations to all orders [20]. Applied to AGASA data, it 
gives an alternative measure of the small angular scale clustering, and also points out 
to an excess of clustering with respect to random expectations which persists up to 
15°, that manifests in the second neighbours cumulative distribution (Figure 5), and 
which is also hinted by the defect of doublets and excess of triplets at scales up to 
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15° in Figure 1. Angular spread of events in a cluster due to deflections by magnetic 
fields, and structure in the spatial distribution of sources, are among conceivable 
causes of such signal, which should be looked upon with better statistics. 
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